---
title: FPDFText_GetText
description: Extract text from a PDF page into a buffer
searchable: true
---

# FPDFText_GetText

`FPDFText_GetText(text_page, start_index, count, result_buffer)`

## Description

Extracts text from a PDF page into a buffer. This function is used to get the actual text content from a page after creating a text page object with `FPDFText_LoadPage`. The extracted text is encoded in UTF-16LE format (2 bytes per character).

## Prerequisites

This example uses the `initializePdfium` helper function from our [Getting Started](/docs/pdfium/getting-started#initializing-pdfium) guide. Make sure to include this function in your code before trying the examples below.

## Parameters

| Name | Type | Description |
|------|------|-------------|
| text_page | number | A text page handle obtained from `FPDFText_LoadPage`. |
| start_index | number | The 0-based index of the first character to extract. |
| count | number | The number of characters to extract. |
| result_buffer | number | Pointer to a buffer to receive the text. Must be allocated with enough space to hold the requested text plus a null terminator. |

## Return Value

Returns the number of characters written to the buffer (including the terminating null character), or 0 on error. Common error cases include:

- The text_page handle is invalid
- The start_index is out of range
- The count is negative or extends beyond the available text
- The result_buffer is invalid or not large enough

## Example

```typescript
// Note: The initializePdfium function is a helper that initializes the PDFium library.
// For the full implementation, see: /docs/pdfium/getting-started
import { initializePdfium } from './initialize-pdfium';

async function extractTextFromPage(pdfData: Uint8Array, pageIndex: number): Promise<string> {
  // Initialize PDFium
  const pdfium = await initializePdfium();
  
  // Load the PDF document
  const filePtr = pdfium.pdfium.wasmExports.malloc(pdfData.length);
  pdfium.pdfium.HEAPU8.set(pdfData, filePtr);
  const docPtr = pdfium.FPDF_LoadMemDocument(filePtr, pdfData.length, 0);
  
  if (!docPtr) {
    const error = pdfium.FPDF_GetLastError();
    pdfium.pdfium.wasmExports.free(filePtr);
    throw new Error(`Failed to load PDF: ${error}`);
  }
  
  try {
    // Load the PDF page
    const pagePtr = pdfium.FPDF_LoadPage(docPtr, pageIndex);
    if (!pagePtr) {
      throw new Error(`Failed to load page ${pageIndex}`);
    }
    
    try {
      // Create a text page object
      const textPagePtr = pdfium.FPDFText_LoadPage(pagePtr);
      if (!textPagePtr) {
        throw new Error(`Failed to load text for page ${pageIndex}`);
      }
      
      try {
        // Get the character count
        const charCount = pdfium.FPDFText_CountChars(textPagePtr);
        if (charCount <= 0) {
          return ''; // No text on this page or error
        }
        
        // Allocate a buffer for the text (+1 for null terminator)
        const bufferSize = (charCount + 1) * 2; // UTF-16, 2 bytes per character
        const textBufferPtr = pdfium.pdfium.wasmExports.malloc(bufferSize);
        
        try {
          // Extract all text from the page
          const extractedLength = pdfium.FPDFText_GetText(
            textPagePtr,
            0,          // Start from first character
            charCount,  // Get all characters
            textBufferPtr
          );
          
          if (extractedLength === 0) {
            throw new Error('Failed to extract text from page');
          }
          
          // Convert the UTF-16LE text to a JavaScript string
          return pdfium.pdfium.UTF16ToString(textBufferPtr);
        } finally {
          // Clean up text buffer
          pdfium.pdfium.wasmExports.free(textBufferPtr);
        }
      } finally {
        // Clean up text page
        pdfium.FPDFText_ClosePage(textPagePtr);
      }
    } finally {
      // Clean up PDF page
      pdfium.FPDF_ClosePage(pagePtr);
    }
  } finally {
    // Clean up document
    pdfium.FPDF_CloseDocument(docPtr);
    pdfium.pdfium.wasmExports.free(filePtr);
  }
}

// Usage
fetch('sample.pdf')
  .then(response => response.arrayBuffer())
  .then(buffer => extractTextFromPage(new Uint8Array(buffer), 0))
  .then(text => console.log('Extracted text:', text))
  .catch(error => console.error('Error:', error));
```

## Usage Examples

### Extracting a specific range of text

```typescript
// Extract text from characters 10 to 20 (11 characters)
const startIndex = 10;
const count = 11;
const bufferSize = (count + 1) * 2; // +1 for null terminator, *2 for UTF-16
const textBufferPtr = pdfium.pdfium.wasmExports.malloc(bufferSize);

try {
  const extractedLength = pdfium.FPDFText_GetText(
    textPagePtr,
    startIndex,
    count,
    textBufferPtr
  );
  
  if (extractedLength > 0) {
    const text = pdfium.pdfium.UTF16ToString(textBufferPtr);
    console.log(`Extracted snippet: "${text}"`);
  }
} finally {
  pdfium.pdfium.wasmExports.free(textBufferPtr);
}
```

## Best Practices

1. **Proper buffer sizing**: Always allocate a buffer that's large enough to hold the requested text plus a null terminator. Each character in PDFium is UTF-16LE encoded, requiring 2 bytes per character.

2. **Check return value**: Always check if `FPDFText_GetText` returns 0, which indicates an error.

3. **Get character count first**: Before calling `FPDFText_GetText`, use `FPDFText_CountChars` to find out how many characters are available on the page.

4. **Free buffer memory**: Always free the memory allocated for the text buffer when you're done with it.

5. **Handle empty text gracefully**: Some pages might not contain any text, so your code should handle empty strings gracefully.

## Common Issues

1. **Buffer overflow**: If the buffer is not large enough to hold the requested text plus a null terminator, `FPDFText_GetText` may fail or cause memory corruption.

2. **Invalid range**: If `start_index` is negative or `start_index + count` exceeds the number of characters on the page, the function will fail.

3. **Character encoding**: The text is returned in UTF-16LE encoding, not UTF-8. Make sure to convert it properly if your application expects UTF-8.

4. **Memory leaks**: Always free the memory allocated for the text buffer when you're done with it.

## Related Functions

- [FPDFText_LoadPage](/docs/pdfium/functions/FPDFText_LoadPage) - Load a page for text extraction
- [FPDFText_CountChars](/docs/pdfium/functions/FPDFText_CountChars) - Get the number of characters on a page
- [FPDFText_ClosePage](/docs/pdfium/functions/FPDFText_ClosePage) - Close a text page and release resources 